28 research outputs found

    Speeding up RDF aggregate discovery through sampling

    Get PDF
    RDF graphs can be large and complex; finding interesting information within them is challenging. One easy way for users to discover such graphs is to be shown interesting aggregates (in the form of two-dimensional graphs, i.e., bar charts), where interestingness is evaluated through statistical criteria. Dagger [5] pioneered this approach; however, it is quite inefficient, in particular due to the need to evaluate numerous, expensive aggregation queries. In this work, we describe Dagger+, which builds upon Dagger and leverages sampling to speed up the evaluation of potentially interesting aggregates. We show that Dagger+ achieves very significant execution time reductions while reaching results very close to those of the original, less efficient system.
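
    A minimal sketch, assuming tabular facts derived from the RDF graph, of the sampling idea described above: score each candidate aggregate on a small uniform sample with a simple statistical criterion (here, the variance of the bar heights), and run the full, expensive aggregation only for candidates that look promising. This is not the actual Dagger+ implementation; all names, the criterion, and the thresholds are illustrative.

```python
import random
from collections import defaultdict
from statistics import pvariance

def aggregate(facts, dimension, measure):
    """Group facts by 'dimension' and average 'measure': one bar chart."""
    groups = defaultdict(list)
    for fact in facts:
        groups[fact[dimension]].append(fact[measure])
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}

def interestingness(bars):
    """Illustrative statistical criterion: variance of the bar heights."""
    return pvariance(list(bars.values())) if len(bars) > 1 else 0.0

def promising_aggregates(facts, candidates, sample_ratio=0.05, threshold=1.0):
    """Filter candidate (dimension, measure) pairs on a sample, then evaluate fully."""
    sample = random.sample(facts, max(1, int(len(facts) * sample_ratio)))
    kept = []
    for dimension, measure in candidates:
        estimate = interestingness(aggregate(sample, dimension, measure))
        if estimate >= threshold:                      # cheap filter on the sample
            kept.append((dimension, measure, aggregate(facts, dimension, measure)))
    return kept
```

    Here 'facts' would be dictionaries materialized from RDF triples (one per resource), and 'candidates' the dimension/measure pairs to explore.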

    Semi-automatic support for evolving functional dependencies

    Get PDF
    During the life of a database, systematic and frequent violations of a given constraint may suggest that the represented reality is changing and thus the constraint should evolve with it. In this paper we propose a method and a tool to (i) find the functional dependencies that are violated by the current data, and (ii) support their evolution when it is necessary to update them. The method relies on the use of confidence, as a measure that is associated with each dependency and allows us to understand "how far" the dependency is from correctly describing the current data; and of goodness, as a measure of balance between the data satisfying the antecedent of the dependency and those satisfying its consequent. Our method compares favorably with literature that approaches the same problem in a different way, and performs effectively and efficiently, as shown by our tests on both real and synthetic databases.
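
    The abstract does not spell out how confidence is computed; the sketch below uses the common g3-style definition (the largest fraction of rows that can be kept so that the dependency holds exactly), which matches the intuition of "how far" a dependency is from the current data. The goodness measure is specific to the paper and is not reproduced here; all names are illustrative.

```python
from collections import Counter, defaultdict

def fd_confidence(rows, antecedent, consequent):
    """g3-style confidence of the FD antecedent -> consequent: the largest fraction
    of rows that can be kept so that the dependency holds exactly.
    'rows' is a list of dicts; 'antecedent'/'consequent' are tuples of column names."""
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        lhs = tuple(row[a] for a in antecedent)
        rhs = tuple(row[c] for c in consequent)
        groups[lhs][rhs] += 1
    keepable = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return keepable / len(rows)

# Example: a systematically violated dependency whose low confidence suggests evolution.
rows = [
    {"zip": "20100", "city": "Milan"},
    {"zip": "20100", "city": "Milan"},
    {"zip": "20100", "city": "Milano"},   # recurring violation
]
print(fd_confidence(rows, ("zip",), ("city",)))   # 0.666...
```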

    Learning from, Understanding, and Supporting DevOps Artifacts for Docker

    Full text link
    With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges, we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80% via a technique we call phased parsing. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
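
    binnacle's mined rules and phased parser are not reproduced in the abstract, so the sketch below only illustrates the general shape of a semantic, rule-based Dockerfile check, using one widely known best practice (apt-get update and apt-get install belong in the same RUN instruction) as a stand-in rule; the function names are illustrative.

```python
import re

def check_apt_update_install(dockerfile_text):
    """Flag RUN instructions that call 'apt-get update' without a matching
    'apt-get install' in the same instruction (a common cache-staleness issue)."""
    violations = []
    # Join lines continued with a trailing backslash so each instruction is one string.
    text = re.sub(r"\\\s*\n", " ", dockerfile_text)
    for instr_no, line in enumerate(text.splitlines(), start=1):
        instr = line.strip()
        if instr.upper().startswith("RUN") and "apt-get update" in instr:
            if "apt-get install" not in instr:
                violations.append((instr_no, "apt-get update without install in the same RUN"))
    return violations

example = """FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y curl
"""
print(check_apt_update_install(example))   # flags instruction 2
```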

    Process conformance checking by relaxing data dependencies

    Get PDF
    Given the events modeled by a business process, in the presence of alternative execution paths it may happen that the data required by a certain event somehow determines which event is executed next. The process can then be modeled using an approximate functional dependency between the data required by both events. We apply this approach in the context of conformance checking: given a business process model with a functional dependency (FD) that no longer corresponds to the observed reality, we propose corrections to the FD to make it exact, or at least to improve its confidence and produce a more accurate model.
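
    A minimal sketch of one plausible correction strategy consistent with the abstract: when an FD over event data loses confidence, try extending its antecedent with another attribute and keep the extension that raises confidence the most. The confidence function repeats the hedged g3-style measure from the sketch above; the strategy and all names are illustrative, not the paper's algorithm.

```python
from collections import Counter, defaultdict

def confidence(rows, lhs_cols, rhs_cols):
    """Fraction of rows that can be kept so that lhs_cols -> rhs_cols holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[c] for c in lhs_cols)][tuple(row[c] for c in rhs_cols)] += 1
    return sum(c.most_common(1)[0][1] for c in groups.values()) / len(rows)

def suggest_antecedent_extension(rows, lhs_cols, rhs_cols):
    """Propose the single extra attribute that most improves the FD's confidence."""
    base = confidence(rows, lhs_cols, rhs_cols)
    candidates = set(rows[0]) - set(lhs_cols) - set(rhs_cols)
    scored = [(confidence(rows, lhs_cols + (col,), rhs_cols), col) for col in candidates]
    best_conf, best_col = max(scored, default=(base, None))
    return (best_col, best_conf) if best_conf > base else (None, base)

# Event data where 'payment -> next_event' alone no longer determines what happens next,
# but adding the customer's country restores an exact dependency.
log = [
    {"payment": "card", "country": "IT", "next_event": "ship_express"},
    {"payment": "card", "country": "FR", "next_event": "ship_standard"},
    {"payment": "card", "country": "IT", "next_event": "ship_express"},
]
print(suggest_antecedent_extension(log, ("payment",), ("next_event",)))  # ('country', 1.0)
```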

    Extraction, Sentiment Analysis and Visualization of Massive Public Messages

    No full text
    This paper describes the design and implementation of tools to extract, analyze and explore an arbitrarily large amount of public messages from diverse sources. The aim of our work is to flexibly support sentiment analysis by quickly adapting to different use cases, languages, and message sources. First, a highly parallel scraper has been implemented, allowing the user to customize its behavior with scripting technologies and thus manage dynamically loaded content. Then, a novel framework is developed to support agile programming, building and validating a classifier for sentiment analysis. Finally, a web application allows the real-time selection and projection of the analysis results along different dimensions in an OLAP fashion.
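
    The paper's own scraper and classification framework are not described here in enough detail to reproduce; as a stand-in for the "build and validate a classifier" step, the sketch below shows a generic scikit-learn pipeline with cross-validation. The data is made up and the model choice is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy labelled messages; in practice these would come from the scraper.
messages = ["great service, thanks!", "this is terrible", "love it", "worst app ever",
            "pretty good overall", "awful experience", "fantastic support", "not happy at all"]
labels   = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Build the classifier: TF-IDF over word unigrams/bigrams feeding a linear model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))

# Validate quickly with cross-validation before running on the full message stream.
scores = cross_val_score(model, messages, labels, cv=4)
print("accuracy per fold:", scores)

model.fit(messages, labels)
print(model.predict(["really good", "so bad"]))
```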

    Data Mining for XML Query-Answering Support

    No full text

    A declarative extension of horn clauses, and its significance for datalog and its applications

    No full text
    FS-rules provide a powerful monotonic extension for Horn clauses that supports monotonic aggregates in recursion by reasoning on the multiplicity of occurrences satisfying existential goals. The least fixpoint semantics, and its equivalent least model semantics, hold for logic programs with FS-rules; moreover, generalized notions of stratification and stable models are easily derived when negated goals are allowed. Finally, the generalization of techniques such as seminaive fixpoint and magic sets makes possible the efficient implementation of DatalogFS, i.e., Datalog with rules with Frequency Support (FS-rules) and stratified negation. A large number of applications that could not be supported efficiently, or could not be expressed at all, in stratified Datalog can now be easily expressed and efficiently supported in DatalogFS, and a powerful DatalogFS system is now being developed at UCLA.
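
    A minimal sketch, in Python rather than DatalogFS syntax, of the kind of monotonic frequency-based recursion FS-rules express: a node becomes "adopting" once at least k of its in-neighbours are adopting, computed by naive fixpoint iteration. The threshold counting over an existential goal is what an FS-rule would state declaratively; the scenario, names, and threshold are illustrative.

```python
def adopters_fixpoint(edges, seeds, k=2):
    """Least fixpoint of: adopt(X) <- seed(X).
                          adopt(X) <- at least k facts edge(Y, X) with adopt(Y).
    Counting over the existential goal is monotonic, so naive iteration converges."""
    adopting = set(seeds)
    changed = True
    while changed:
        changed = False
        for node in {dst for _, dst in edges}:
            if node in adopting:
                continue
            support = sum(1 for src, dst in edges if dst == node and src in adopting)
            if support >= k:
                adopting.add(node)
                changed = True
    return adopting

edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e"), ("c", "e")]
print(adopters_fixpoint(edges, seeds={"a", "b"}, k=2))   # {'a', 'b', 'c', 'd', 'e'}
```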